Abstract

Background: Wheat leaf rust (Puccinia triticina Eriks; Pt) and stem rust fungi (P. graminis f.sp. tritici; Pgt) are 
significant economic pathogens having similar host ranges and life cycles, but different alternate hosts. The 
Pt genome, currently estimated at 135 Mb, is significantly larger than Pgt, at 88 Mb, but the reason for the 
expansion is unknown. Three genomic loci of Pt conserved proteins were characterized to gain insight into gene 
content, genome complexity and expansion. 

Results: A bacterial artificial chromosome (BAC) library was made from P. triticina race 1, BBBD and probed with 
Pt homologs of genes encoding two predicted Pgt secreted effectors and a DNA marker mapping to a region of 
avirulence. Three BACs, 103 Kb, 112 Kb, and 166 Kb, were sequenced, assembled, and open reading frames were 
identified. Orthologous genes were identified in Pgt and local conservation of gene order (microsynteny) was 
observed. Pairwise protein identities ranged from 26 to 99%. One Pt BAC, containing a RAD18 ortholog, shares 
syntenic regions with two Pgt scaffolds, which could represent both haplotypes of Pgt. Gene sequence is diverged 
between the species as well as within the two haplotypes. In all three BAC clones, gene order is locally conserved;
however, gene shuffling has occurred relative to Pgt. These regions are further diverged by differing insertion loci of
LTR-retrotransposon, Gypsy, Copia, Mutator, and Harbinger mobile elements. Uncharacterized Pt open reading frames 
were also found; these proteins are high in lysine and similar to multiple proteins in Pgt. 

Conclusions: The three Pt loci are conserved in gene order, with a range of gene sequence divergence. 
Conservation of predicted haustoria expressed secreted protein genes between Pt and Pgt is extended to the more 
distant poplar rust, Melampsora larici-populina. The loci also reveal that genome expansion in Pt is in part due to 
higher occurrence of repeat-elements in this species. 
Background

Plants and pathogens are in a constant struggle as each 
co-evolves to adapt to genomic changes. Plant genomes 
are adapting to different modes of infection by pathogens 
while pathogens are evolving different avenues to circum- 
vent defense systems of their respective hosts. Rust fungi 
are among the most economically important pathogens, 
yet are part of elusive host-pathogen systems. The order 
Pucciniales (formerly Uredinales or Urediniomycetes)
contains over 7,000 different species from 100 genera [1]. 
Adding to the complexity, individual cereal crops can be 
infected by several rust fungi adapted to the specific crop. 

Cereal rust fungi are obligate biotrophs and have 
alternate hosts where sexual recombination takes place, 
allowing for diversification of the population [2]. The life 
cycle of cereal rust fungi begins with a urediniospore 
landing on a leaf surface and germinating in the presence 
of adequate humidity. A germtube emerges and moves
towards a stomate via a thigmotropic response and
probable chemical cues [3], where an appressorium will
form. A hypha grows inside the substomatal space until a
mesophyll cell is encountered. The fungus will penetrate
the cell wall and produce a haustorium by invagination of
the plasma membrane [4,5]. At each stage of infection, the
fungus is postulated to secrete effectors to inhibit cell 
defenses and reprogram cells to redirect nutrients. 
Though some candidate effectors are shared among the 
rust fungi, most are specific to their host and include 
transcription factors, zinc finger proteins, small secreted 
proteins and cysteine-rich proteins [6]. Certain classes of
effectors, such as ones modulating host immunity, are
believed to change rapidly to overcome resistance;
however, the mechanisms generating this variation are
not known. In several studied pathogens, certain classes
of predicted effectors are found in variable and highly
mutable regions of the genome. Mobile elements
induced mutations in effectors in Phytophthora [7],
Magnaporthe [8], and Leptosphaeria [9], while Fusarium
oxysporum has a specialized chromosome with effectors
[10,11]. Effectors can be clustered in the genome (Ustilago
[12]), including at telomeres (Fusarium [13]; Magnaporthe
[14]). Avirulence genes from the flax rust fungus,
Melampsora lini, are all small secreted proteins [15,16].
Currently, two effectors have been identified in uredinios- 
pores of Puccinia graminis f.sp. tritici (Pgt) that induce the 
in vivo phosphorylation and degradation of the barley 
resistance protein, RPG1 [17]. 

Sequencing technology has made significant advance- 
ments in recent years. Complete genomes of more species, 
including fungi, are being sequenced. Comprehensive 
catalogs of genes can be generated, annotated, and 
comparisons made to other genomes. Core sets of 
genes needed for function, adaptations for life cycle, 
and host specificity can now be found. Comparisons of 
several obligate fungal plant parasites have identified 
common losses of genes involved in nitrate and sulfur 
metabolism [6,18]. Melampsora larici-populina (Mlp) 
and Pgt have approximately 8,000 orthologous genes 
which could be suggested as a core set needed for bio- 
trophism. However, 74% and 84% of the secreted proteins, 
respectively, are lineage specific [6] suggesting proteins 
that are needed for the individual life cycle. Corn patho- 
gens, U. maydis and S. reilianum are also closely related 
and share 71% of effector genes in so-called divergence 
clusters. However, 10% are U. maydis specific while 
19% are specific to S. reilianum [19]. 

Puccinia triticina (Pt) is the causal agent of wheat leaf 
rust and new races emerge each year aided by a crop 
monoculture placing a strong selection pressure on the 
pathogen. Genetic variation is generally believed to 
increase through sexual recombination to generate new 
allele combinations. Two related wheat rust fungi, Pgt
and P. striiformis f.sp. tritici (Pst), causing stripe rust,
have a sexual cycle on North American Berberis spp. and
have a greater race variability where the alternate host is
present [20]. The Pt aeciospore stage is on Thalictrum
spp. and Isopyrum, found mainly in the Mediterranean
region; however, other Thalictrum species present in
North America can support a reduced level of infection
[21] but are generally resistant to Pt [22,23]. Populations
are essentially asexual, as supported by the lack of recombination
found in numerous North American races [24-28]. A
parasexual cycle may exist allowing recombination since 
germtube fusion, nuclear migration, and bridging structures 
between nuclei have been observed in Pt [29].

The obligate biotrophic nature of cereal rusts makes 
experimental manipulation difficult; however, genomics
provides a means of studying evolution and gene 
function. We set out to understand the genome variation 
of two rust fungi at three regions. A Pt bacterial artificial 
chromosome (BAC) library was made and clones were 
identified using three probes that would isolate regions 
of predicted secreted proteins and avirulence. Sequenced 
DNA regions of Pt were compared to syntenic regions in 
two rust species with complete genome sequences, Pgt, 
and Mlp [6], and evaluated for genomic conservation, 
expansion and mutations. 

Results 

BAC library construction 

Urediniospores harbor two haploid nuclei with an esti- 
mated total genome complexity for Pt of approximately 
135 Mb, based on comparative DNA fluorescence 
(L. Szabo, unpublished) and the current total size of
the genome assembly
(http://www.broadinstitute.org/annotation/genome/puccinia_group/GenomeStats.html).
The generated P. triticina BAC library contained 15,360 
clones arrayed in 384 well plates with an average insert 
size of 105 kb representing an estimated 10 to 12 genome 
equivalents. A single-copy probe identified nine positive 
clones on high density filters, and assuming fragments 
were randomly cloned during library construction, this is 
in agreement with the estimated genome coverage. 
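
The coverage estimate can be checked with a short calculation. The sketch below is illustrative only and is not part of the original analysis: it assumes the 135 Mb genome estimate and the 105 kb mean insert size quoted above, and it uses the standard Clarke and Carbon approximation for the probability that a given single-copy locus is present in the library. The function names are hypothetical.

```python
# Illustrative check of BAC library coverage (not from the original study).

N_CLONES = 15_360            # clones in the library, as stated in the text
MEAN_INSERT_BP = 105_000     # average insert size, as stated in the text
GENOME_BP = 135_000_000      # estimated Pt genome size, as stated in the text


def genome_equivalents(n_clones, mean_insert_bp, genome_bp):
    """Total cloned DNA expressed as multiples of the genome size."""
    return n_clones * mean_insert_bp / genome_bp


def prob_locus_present(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon probability that a single-copy locus is in at least one clone."""
    fraction_per_clone = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


coverage = genome_equivalents(N_CLONES, MEAN_INSERT_BP, GENOME_BP)
p_present = prob_locus_present(N_CLONES, MEAN_INSERT_BP, GENOME_BP)

print(f"Genome equivalents: {coverage:.1f}")
print(f"Probability a single-copy locus is represented: {p_present:.6f}")
```

With these inputs the library represents about 11.9 genome equivalents, at the upper end of the 10 to 12 range quoted above. A single-copy probe would be expected to detect roughly that many clones on average, which is of the same order as the nine positives observed.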

BAC clone selection, sequencing and characterization 

Three genomic regions were targeted for comparison. 
Previous work had mapped a Pgt RAD18 homolog in a
genomic region harboring an avirulence gene [30]. Using
PgtRAD18 as a reference, PT0313.J16.C21 (GenBank
accession number GR497566) was identified from a Pt
EST database using TBLASTN (E value = e-107; [31,32])
and used as a probe. Nine positive BAC clones were found
and clone 1F16 (Pt1F16) was selected for sequencing
because of its longer length and the centralized location of
PtRAD18 within the BAC clone. Sequences from Pt1F16
were assembled into two contiguous sequences of 39,219 
and 63,874 bp, totaling 103,093 bp (GenBank JX489506). 
The GC content of these sequences was 47%. Subclones 
were generated spanning the gap for orientating and
ordering of the two contigs. However, due to a region of 60
near-perfect 46 bp repeats of
ACCAGCCCGCCGAGAGGAAGCCCTCTCGGCGAGCTGGTGTGTGTAT, the gap
could not be closed. FGENESH, with gene models from
the Puccinia group genome project
(http://www.broadinstitute.org/annotation/genome/puccinia_group/),
predicted 30 open reading frames (ORFs) ranging from 210
to 4,077 bp in length.
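
As a quick consistency check on the figures above, the following sketch (not part of the original work) confirms the length of the quoted repeat unit and estimates the span of 60 tandem copies. It assumes the copies are perfect and contiguous, whereas the text describes them only as near-perfect, so the span is an approximation. The helper names are hypothetical.

```python
# Illustrative check of the tandem repeat that blocked gap closure in Pt1F16.

REPEAT_UNIT = "ACCAGCCCGCCGAGAGGAAGCCCTCTCGGCGAGCTGGTGTGTGTAT"  # unit quoted in the text
N_COPIES = 60  # number of near-perfect copies reported in the text


def tandem_array_length(unit, copies):
    """Length in bp of `copies` perfect, contiguous copies of `unit` (an assumption)."""
    return len(unit) * copies


def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


unit_length = len(REPEAT_UNIT)
array_length = tandem_array_length(REPEAT_UNIT, N_COPIES)

print(f"Repeat unit length: {unit_length} bp")
print(f"Approximate array span: {array_length} bp")
print(f"GC fraction of the unit: {gc_fraction(REPEAT_UNIT):.2f}")
```

Under that assumption the array spans about 2.8 kb of near-identical sequence.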

In a functional screen of Uromyces viciae-fabae, 
secreted peptide effector protein UF5 was related to the 
flax rust Melampsora lini haustorial-expressed secreted 
protein HESP-379 [16]. A Pgt genome search revealed 
several predicted secreted protein homologs in close 
proximity, suggesting the presence of small clusters of 
predicted secreted proteins. UF5 (GenBank ES608162)
aligned with two predicted secreted proteins, PGTG_03708
(E score 5e-33) and PGTG_03709 (E score 0), both transcribed
and located 513 bp apart on the Pgt contig.
Using these Pgt sequences, PtContig18 (GenBank accession
HP451841) and PtContig7347 (GenBank accession
HP458556) were identified by a BLASTN Pt EST database
search. A PCR product from the cDNA clone, Pt
EST PT0061b.D10.TB that aligned to Contig18 (GenBank
accession EC400508), was used as a probe to identify Pt
BAC PtHSP02. Sequencing of this BAC resulted in four 
assembled contigs. Gaps could be spanned and thus the 
contigs could be ordered and oriented. Sizes of the con- 
tigs in bp were 16,991, 30,055, 5,014, and 60,277 for a 
total of 112,337 bp (GenBank JX489507). Gaps were 
present in regions of repeated DNA and could not be 
assembled. GC content was 46.3% and FGENESH pre- 
dicted 31 ORFs in the contig ranging from 174 bp to 
7,167 bp in length. The smaller ORFs were generally 
within repeated elements. 

The bean rust effector UfHSP42c Uf011 (GenBank
ES608167; [33]) matched three predicted protein sequences
in Pgt, PGTG_17547 (E score 0), PGTG_17548 (E score
1e-21) and PGTG_17549 (E score 1e-4). UfHSP42c matched
five Pt ESTs, including clone PT0131d.B10.BR (GenBank 
accession EC414978) from which probes were derived to 
identify Pt BAC clone HSP04. Sequencing of HSP04 pro- 
duced two contiguous sequences of 9,276 bp and 
157,027 bp for a total of 166,303 bp (GenBank 
JX489508). GC content was 46.3% and 61 ORFs were 
predicted ranging from 120 bp to 5,214 bp in length. 
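
The contig lengths reported for the three BAC clones can be summed to confirm the stated totals. This is a minimal arithmetic check that uses only the numbers given in the text; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Illustrative check that reported contig lengths add up to the stated BAC totals.

BAC_CONTIGS_BP = {
    "Pt1F16": [39_219, 63_874],
    "PtHSP02": [16_991, 30_055, 5_014, 60_277],
    "PtHSP04": [9_276, 157_027],
}

REPORTED_TOTALS_BP = {
    "Pt1F16": 103_093,
    "PtHSP02": 112_337,
    "PtHSP04": 166_303,
}


def total_length(contigs):
    """Sum of contig lengths in bp (gaps between contigs are not counted)."""
    return sum(contigs)


totals = {bac: total_length(contigs) for bac, contigs in BAC_CONTIGS_BP.items()}

for bac, total in totals.items():
    status = "matches" if total == REPORTED_TOTALS_BP[bac] else "differs from"
    print(f"{bac}: {total} bp ({status} the reported total)")
```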

BAC annotation 

The predicted ORFs from each BAC clone were aligned 
using BLASTN to the Pgt genome, Pgt predicted 
transcripts and Pt ESTs, and using BLASTX, to the Pgt, 
Mlp, and U. maydis (Um) predicted proteomes (Table 1). 
Pt1F16 had nine ORFs with synteny in Pgt. Identity
across the protein sequences ranged from 37-87% in
these alignments and putative annotations could be
assigned to five of the proteins. Pt1F16-4 contained
many gaps when compared to PGTG_13013. Proteins
Pt1F16-5, 6, 7, 8 and 9 aligned with two proteins each
from Pgt. Pt1F16-7 aligned with PgtRAD18, which has
one copy in each of the Pgt haplotype genomes. All but 
one homolog could also be found in Mlp and four were 
represented in Um (Table 1). 

Nine predicted proteins in PtHSP02 were confirmed 
through EST sequence alignment [32] and a putative 
function could be assigned to eight of them. Alignment 
identity ranged from 30-100% in PtHSP02. Eight homologs 
could be found in both Mlp and Um in PtHSP02. The most 
highly conserved protein is PtHSP02-6, a G-protein β-subunit
containing a conserved WD-40 repeat motif. The
first 343 amino acids were 100% identical to PGTG_03727 
and 99% to Mlp accession GL883091 (Table 1). Conversely, 
PtHSP02-3 was only 30% identical to PGTG_3706 and had 
no homologs in the other two fungi. PtHSP02-4 and 
PtHSP02-5 aligned with Mlp HESP-379, the haustorial 
expressed predicted secreted protein homolog from M. lini, 
and a homolog was found for each in Pgt (Table 1, Figure 1). 
Two insertions/deletions were found in PtHSP02-4 and 
PGTG_3708 (Figure 1A). PtHSP02-5 and PGTG_3709 
aligned to homologs from M. lini, Mlp, M. medusae 
deltoidis, and U. maydis. The N-terminal half of the 
protein was conserved between Puccinia and Melampsora 
(Figure 1B). There appeared to be 48 genus-specific
amino acid changes across the protein. Um was the most 
diverged with only a few conserved motifs. 
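
The protein identities reported in this section come from BLAST alignments. As a rough illustration of what such a percentage means, the sketch below computes identity over the ungapped columns of a pre-aligned pair of sequences. This is an assumption-laden simplification: BLAST reports identities over its own local alignments, so values from this function may not match the figures in Table 1. The example sequences are invented for illustration.

```python
# Minimal sketch: percent identity over ungapped columns of a pairwise alignment.


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues among columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # columns containing an indel are skipped
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: one substitution (T/S) and one indel.
example_identity = percent_identity("MKT-LLA", "MKSALLA")
print(f"Identity: {example_identity:.1f}%")
```

Excluding gapped columns is one of several conventions; counting indel columns in the denominator would give lower values for alignments such as PtHSP02-4, where insertions/deletions were observed.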

Fourteen predicted proteins were identified in PtHSP04 
and could be supported through EST sequence alignment. 
Every protein had a homolog in Pgt with protein identities 
ranging from 26-95% (Table 1); nine could be assigned a 
putative function. Eight PtHSP04 proteins had homologs 
in Mlp and five in Um. PtHSP04-1, 5, and 14 appeared to
be unique to Pt with little homology to Pgt. The predicted
transcripts of PtHSP04-6, 7, 8 and 9 aligned to a single
EST of P. striiformis predicted to encode a secreted
protein (ADA54575; [34]) at scores of 4e-5, 2e-8, 6e-48,
and 3e-9, respectively. PtHSP04-6 and 7 both aligned to
PGTG_17549, though with 26 and 60% identity,
respectively. The predicted HSP04-7 ORF is 1,095 bp in
length and contains a 3' in-frame repeat of nine nucleotides,
GG(C/T) AC(T/A) AC(T/A), translating to 30
three-amino-acid repeats of Gly-Thr-Thr. Without the repeat,
PtHSP04-7 is a homolog to PGTG_17549, while PtHSP04-6
is unique to Pt. PtHSP04-8 and 9 are responsible for the
homology to Uf-HSP42c and isolation of the BAC clone
(Figure 2, Table 1). They are nearly identical except
for the C-terminal 18 amino acids, where PtHSP04-9 has a
five amino acid deletion and only four identities (Figure 2).
Each aligned to PGTG_17547 and PGTG_17548, adjacent
proteins which themselves are 100% identical. PtHSP04-8
and 9 are 76% and 71% identical to PGTG_17547,
respectively (Figure 2, Table 1).

Table 1 Gene features in three Puccinia triticina BAC clones and their alignment to other sequenced Basidiomycetes,
P. graminis tritici (Pgt), Melampsora larici-populina (Mlp), and Ustilago maydis (Um)

P. triticina  Pgt genome              Pgt protein  Mlp gene         Um gene    BLASTX
ORF           feature*                identities   feature**        feature    annotation

1F16-1        PGTG_12990              54%          35.90208/11      -          Similar to Uhrf1
1F16-2        PGTG_13012              60%          104.11397        Um_00786   Hypothetical protein
1F16-3        PGTG_13013              63%          74.94858         -          Similar to esterase
1F16-4        PGTG_13016, PGTG_18731  37%, 64%     -                -          Predicted protein
1F16-5        PGTG_13018, PGTG_18732  67%, 68%     49.39112         Um_02725   Similar to molybdopterin synthase sulphurylase
1F16-6        PGTG_13021, PGTG_18735  68%, 64%     74.73436         -          Hypothetical protein
1F16-7        PGTG_13023, PGTG_18741  56%, 56%     74.94864         Um_05085   RAD18
1F16-8        PGTG_13024, PGTG_18744  64%, 65%     74.000024        -          Cysteine-rich SCP-like extracellular protein
1F16-9        PGTG_13026, PGTG_18746  87%, 86%     74.50754/28052   Um_00594   Similar to pyruvate dehydrogenase complex
HSP02-1       PGTG_3730/1             79%          -                Um_00736   Conserved protein
HSP02-2       PGTG_[illegible]        48%          -                Um_04270   Aspartyl-tRNA synthetase
HSP02-3       PGTG_3706               30%          -                -          -
HSP02-4       PGTG_3708               69%          2.70587          Um_01555   Mlp HESP-379
HSP02-5       PGTG_3709               83%          2.70587          Um_01555   Mlp HESP-379
HSP02-6       PGTG_3727               100%         2.76428          Um_00703   G-protein beta subunit
HSP02-7       PGTG_3728               68%          2.115002         Um_03486   Nucleotide-binding protein 2
HSP02-8       PGTG_3729               82%          2.46419          Um_05743   Pre-mRNA splicing factor ATP-dependent RNA helicase PRP16
HSP02-9       PGTG_3730               87%          10.115914        Um_04551   Similar to cyclin Ctk2
HSP04-1       PGTG_16978              36%          -                -          Predicted protein
HSP04-2       PGTG_16976              73%          27.88522         Um_00639   Nucleoporin-like
HSP04-3       PGTG_10949              95%          47.72927         Um_02479   60S ribosomal protein
HSP04-4       PGTG_02586              33%          27.88520         -          Heat shock protein 90
HSP04-5       PGTG_14539              41%          -                -          Predicted protein
HSP04-6       PGTG_17549              26%          -                -          Predicted secreted protein
HSP04-7       PGTG_17549              60%          -                -          Predicted secreted protein
HSP04-8       PGTG_17547/8            76%          16.85997         -          Uf-HSP42c
HSP04-9       PGTG_17547/8            71%          16.85997         -          Uf-HSP42c
HSP04-10      PGTG_05205              82%          22.48630         -          Integral membrane protein
HSP04-11      PGTG_17545              52%          -                Um_00662   Predicted protein
HSP04-12      PGTG_17544              83%          23.87824         Um_03820   Vacuolar sorting protein PEP5
HSP04-13      PGTG_17543              62%          23.72100         Um_02189   Hypothetical protein
HSP04-14      PGTG_17537/8            38%          -                -          Predicted protein

* Pgt gene features indicate gene number assigned by the Broad Institute during assembly (http://www.broadinstitute.org/annotation/genome/puccinia_group/, verified September 28, 2012). Two numbers indicate different scaffolds, as highlighted in Figure 1.

** Mlp gene features are indicated by scaffold number; gene number as assigned by the Joint Genome Initiative at the Department of Energy (http://genomeportal.jgi-psf.org/Mellp1/Mellp1.home.html, verified September 28, 2012).

Figure 2 ClustalW alignments of two predicted secreted proteins coded on BAC clone PtHSP04. PtHSP04-8 and 9 are aligned to
homologs from Pgt and Uromyces fabae.
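
The degenerate nine-nucleotide repeat described for PtHSP04-7, GG(C/T) AC(T/A) AC(T/A), can be expanded to confirm that every variant encodes Gly-Thr-Thr. The sketch below is illustrative and not part of the original analysis; it uses only the four relevant entries of the standard genetic code, and the data structure used to encode the degenerate positions is a hypothetical convenience.

```python
# Illustrative expansion and translation of the degenerate repeat in PtHSP04-7.
from itertools import product

# Subset of the standard genetic code covering the codons in the repeat.
CODON_TO_AA = {"GGC": "Gly", "GGT": "Gly", "ACT": "Thr", "ACA": "Thr"}

# Each codon: fixed first two bases plus the allowed third bases.
DEGENERATE_REPEAT = [("GG", "CT"), ("AC", "TA"), ("AC", "TA")]

N_REPEATS = 30          # repeat copies reported in the text
ORF_LENGTH_BP = 1_095   # predicted HSP04-7 ORF length reported in the text


def expand_repeat(degenerate_unit):
    """All codon triplets allowed by the degenerate repeat unit."""
    per_codon = [[prefix + third for third in thirds] for prefix, thirds in degenerate_unit]
    return list(product(*per_codon))


def translate(codons):
    """Translate a tuple of codons using the small lookup table above."""
    return tuple(CODON_TO_AA[codon] for codon in codons)


variants = expand_repeat(DEGENERATE_REPEAT)
peptides = {translate(variant) for variant in variants}
repeat_bp = N_REPEATS * 9
repeat_fraction = repeat_bp / ORF_LENGTH_BP

print(f"{len(variants)} nucleotide variants encode: {sorted(peptides)}")
print(f"Repeat region: {repeat_bp} bp ({repeat_fraction:.0%} of the ORF)")
```

All eight nucleotide variants translate to the same tripeptide, and 30 copies would account for 270 bp, about a quarter of the 1,095 bp ORF.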

Repetitive elements and repeated sequences 

Each BAC was evaluated for repeat elements by using 
REPBASE against Pgt, Pt and Pst genomes. Complete 
and incomplete terminal inverted repeats (TIR), LTRs, 
Copia, Gypsy, Mariner, Mutator, Harbinger, Helitron, hAT,
and DNA transposons were found (Additional file 1:
Tables S1 and S2). Major insertions are represented in
Figure 1. Copia elements were found inserted within
Gypsy elements in Pt1F16 and PtHSP02. PtHSP02 and
PtHSP04 also had localization of LTRs.

Synteny 

To investigate whether the high number of candidate 
orthologs with Pgt maintained the same gene order, the 
Pt BAC sequences were aligned to the available Pgt 
contig sequences. Figure 3 graphically represents the 
location along each BAC clone of Pt ORFs with EST 
sequence or protein homology support. The majority of 
Pt1F16 aligned to the 325,000 bp to 415,000 bp region
of Pgt scaffold (SC) 40 but also to the 5,000 to 65,000 bp
region of PgtSC110. PgtSC40 and PgtSC110 could either
represent the two Pgt haplotypes or a duplication of this
region in the genome. Overall, gene order was maintained
in both scaffolds. As previously noted, eight of the Pt1F16
ORFs aligned to homologs in Pgt but Pt1F16-1 to 3 were
found only on PgtSC40 (Table 1, Figure 3A). Pt1F16-1
aligned to PGTG_12990, 85 kb upstream of
PGTG_13012 in SC40, whereas Pt1F16-2 and 3 were similarly
spaced as their counterparts on this Pgt SC. Between
Pt1F16-4 and 5, four retrotransposons were found, of
which one was similar to a retroelement in PgtSC110. No
mobile elements were found in this region on PgtSC40.
PtRAD18 (Pt1F16-7) is a single ORF while Pt1F16-8
aligned to an ORF corresponding to a cysteine-rich SCP
family protein in both SCs of Pgt.

PtHSP02 aligned to a single scaffold, PgtSC7 (Figure 3B). 
A second haplotype was not detected as the Pgt assembly 
represents most loci with a single sequence [6]. Nine Pt 
ORFs could be aligned to homologs on PgtSC7 (position 
1,135,000 to 1,280,000). As with the other BAC clones, 
gene order was generally maintained. However, PtHSP02-1
and PtHSP02-2 were found embedded between retroelements
and LTRs. While PtHSP02-1 aligned to two
fragments on PgtSC7, PtHSP02-2 was 48% identical to
a gene on PgtSC15 elsewhere in the genome. The
remaining genes in PtHSP02 were in the same order as on
PgtSC7, except that a large insertion of approximately 70 kb of
DNA, including sequence similar to mobile elements, was
found between PGTG_03709 and PGTG_03708 on
PgtSC7. Additional PgtSC7 DNA insertions were evident
within this gene cluster whereas the Pt homologs were
packed in a tighter arrangement. Across this region, a
higher number of retrotransposon elements were found on
PtHSP02 (Additional file 1: Table S1).

PtHSP04 aligned to at least six regions within the Pgt 
genome and represents the least syntenic sequence 
amongst the three BAC clones (Figure 3C). PtHSP04-1
and 2 were found on PgtSC84; however, there were
several repeat elements within both the Pt and Pgt
regions. PtHSP04-3 appeared to be a fragmented ORF
because a single homologous ORF was found on
PgtSC35. PtHSP04-4 and 5 were found on two separate
scaffolds, PgtSC4 and PgtSC48, respectively. PtHSP04-6, 7,
8, and 9 have homologs on Pgt SC89 in the same order
and similar gene distance (conserved micro-synteny).
PtHSP04-10, flanked by an LTR and Harbinger element,
does not have a homolog on PgtSC89, but on PgtSC13.
Microsynteny of PtHSP04-11, 12, and 13 to Pgt is maintained.
PtHSP04-14 is a single copy gene in Pt but is
repeated in Pgt. Between BAC positions 60,000 and 125,000
there is a high number of repeat elements.

Figure 3 Graphical representation of three BAC clones from P. triticina, Pt1F16 (A), PtHSP02 (B) and PtHSP04 (C), and their synteny to
super contigs (SC) of P. graminis tritici (Pgt). Lines connect homologs between genomes.
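
The fragmentation of PtHSP04 synteny described above can be summarized by grouping consecutive Pt ORFs whose homologs lie on the same Pgt scaffold. The sketch below is a simplified illustration, not the method used in the study: it uses only the scaffold assignments stated in the text for PtHSP04-1 to 10 (the scaffolds for PtHSP04-11 to 14 are not given there and are omitted), and it ignores gene orientation and position within a scaffold. The function name is hypothetical.

```python
# Illustrative grouping of PtHSP04 ORFs into runs sharing a Pgt scaffold.
from itertools import groupby

# (Pt ORF, Pgt scaffold of its homolog), in BAC order, as stated in the text.
PT_HSP04_HOMOLOGS = [
    ("PtHSP04-1", "PgtSC84"),
    ("PtHSP04-2", "PgtSC84"),
    ("PtHSP04-3", "PgtSC35"),
    ("PtHSP04-4", "PgtSC4"),
    ("PtHSP04-5", "PgtSC48"),
    ("PtHSP04-6", "PgtSC89"),
    ("PtHSP04-7", "PgtSC89"),
    ("PtHSP04-8", "PgtSC89"),
    ("PtHSP04-9", "PgtSC89"),
    ("PtHSP04-10", "PgtSC13"),
]


def scaffold_blocks(orf_to_scaffold):
    """Group consecutive ORFs whose homologs are on the same scaffold."""
    blocks = []
    for scaffold, run in groupby(orf_to_scaffold, key=lambda pair: pair[1]):
        blocks.append((scaffold, [orf for orf, _ in run]))
    return blocks


blocks = scaffold_blocks(PT_HSP04_HOMOLOGS)
for scaffold, orfs in blocks:
    print(f"{scaffold}: {', '.join(orfs)}")
print(f"Number of blocks: {len(blocks)}")
```

For the ten ORFs listed, this grouping yields six blocks, with the longest run being the four predicted secreted protein genes on PgtSC89.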



One of the most interesting sets of sequences was Pt
ORFs for which numerous homologous copies were
found in the Pgt genome but were not classified as
typical mobile elements (identified by small letters in
Figure 3; Table 2). Twenty of these ORFs had repeats in
the Pgt genome numbering from 19 to 474. Table 2 lists
the conserved amino acid domains, if present, in each of
the ORFs and the percent identity, which ranged from
34-74%. Each ORF was compared to an RNAseq cDNA
library of Pt-infected leaf tissue (Fellers and Bruce,
unpublished) and nine aligned to the experimental
cDNA sequences. The predicted proteins were analyzed
for amino acid content and most had an abundance of
Lys, which is suggestive of helical structures. Each of
the proteins was also compared to the PHYRE 2.0
structural database [35], resulting in seven that
revealed regions that aligned, with confidence, to
known structures. The first 191 amino acids of PtHSP04-j
had a structure similar to RAD54, with 99.7% confidence.
Of note, PtHSP04-e was expressed and was 51% identical
to a protein in Mlp.

Table 2 Detailed analysis of non-transposable element, repeated sequences in three P. triticina BAC clones

ORF      Size bp  Pgt repeats  Conserved domain  % Ident  Exp*  Dominant peptide  Notes

1F16-a   757      35           194-443           51       no    9.2% Lys          216-252-40.3% CI Winged helix DNA/RNA binding
1F16-b   458      98           97-458            40       no    11.?% Lys         Highly helical
1F16-c   462      98           all               40       no    1?.?% Lys         ?-91-?0.3% CI ubiquitin ligase
1F16-d   489      44           344-485           48       no    10.8% Lys         3 alpha helices and 7 beta sheets in conserved domain
                                                                13.0% Ser
1F16-f   651      80           51-245            74       -     9.2% Lys          38-163-88.7% CI oxidoreductase
HSP02-a  658      19           all               31       no    9.7% Lys
HSP02-b  299      80           35-94             68       no    9.4% Lys          32-73-80% CI metal binding protein
                                                                11.7% Ser         145-256-80% CI protease hydrolase inhibitor
HSP02-c  252      50           33-127            52       yes   11.1% Lys         33-95-53% CI DNA binding domain
                                                                10.7% Glu
HSP02-d  243      35           all               -        yes   11.9% Ala         Alignments in Pgt are to DNA, not protein
                                                                13.6% Ser
                                                                11.5% Thr
HSP04-a  952      74           256-470           48       no    9.9% Ala
HSP04-b  484      76           300-454           74       no    none
HSP04-c  442      69           all               34       yes   9.0% Lys
HSP04-d  679      81           10-351            68       no    9.3% Lys          11 alpha helices in conserved region
HSP04-e  420      131          55-386            55       yes   9.5% Ala          216-388-51% identical in Mlp
HSP04-f  262      474          20-189            44       no    11.5% Ser         EST sequence hits
HSP04-g  543      96           all               69       yes   11.4% Lys         Highly helical
                                                                9.0% Ser
HSP04-h  262      66           199-258           65       yes   12.6% Lys         P-loop NTPase/DEAD box
HSP04-i  326      20           127-242           34       yes   12.8% Lys         Highly helical
                                                                11.0% Glu
HSP04-j  309      69           191-303           35       no    9.7% Lys          2-191-99.7% CI Recombinant DNA binding/RAD54 like
                                                                10.7% Ser

Each was evaluated for number of repeated homologs in the Pgt genome, region of conservation, alignment identity, expression in infected tissue, the dominant peptides, and their best detected fit to a protein
structural model.

* Exp - expression in infected tissue 6 dpi.

** PHYRE 2.0 confidence interval (CI) of protein segment matching a structural model with the listed function [35].
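
The "dominant peptide" values in Table 2 are amino acid percentages of each predicted protein. A minimal sketch of such a composition analysis is given below. It is not the authors' procedure: the 9% reporting threshold is an assumption chosen because the smallest value listed in Table 2 is 9.0%, and the example sequence is invented purely for illustration.

```python
# Minimal sketch of an amino acid composition analysis (threshold is an assumption).
from collections import Counter


def residue_percentages(protein):
    """Percentage of each one-letter residue in a protein sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def dominant_residues(protein, min_percent=9.0):
    """Residues at or above `min_percent`, most abundant first (ties by letter)."""
    percentages = residue_percentages(protein)
    hits = [(aa, value) for aa, value in percentages.items() if value >= min_percent]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


# Hypothetical lysine-rich sequence, not taken from any Pt ORF.
EXAMPLE_PROTEIN = "MKKAEELLKKSGKTAEKLKR"
dominant = dominant_residues(EXAMPLE_PROTEIN)

for aa, value in dominant:
    print(f"{aa}: {value:.1f}%")
```

A high Lys fraction on its own is only suggestive; secondary structure predictions such as those obtained from PHYRE would be needed to support a helical assignment.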

Discussion 

This study was performed to look at three regions of the 
Pt genome that were hypothesized to be under selection 
pressure because of the presence of putative secreted 
proteins or loci associated with avirulence. To begin 
with, gene order is conserved between the Pt BACs and 
Pgt. However, there is a wide range of protein conserva- 
tion. A previous comparison of ESTs of Pt and Pgt found 
a similar level of variation in sequence, but only 40% of 
the Pt EST unigenes had orthologs in Pgt [32]. Many 
genes were likely missing in the unigene set because of 
the difficulty of sampling other Pt life stages to sufficient 
depth, affecting the percentage. Nevertheless, within the 
BAC clones, many protein identities were supported by 
ESTs and similar sequence variation was present [32]. 
Some proteins were highly conserved between the two 
wheat rust fungi and had homologs in Mlp and Um [36]. 

The three genes used for identifying the BACs were of 
most interest, in particular, the amount of variation 
within the sequence. PgtRAD18 had been associated 
with an avirulence locus in Pgt [30]. PtRAD18 protein 
length is relatively similar but the sequence has diverged 
from the PgtRAD18 with only 56% identity. Structurally, 
PtRAD18 is still closely associated with a predicted 
secreted protein. Pt has two genes similar to HESP-379 
from M. lini [16]. Two indels in PtHSP02-4 suggest a 
recombination event or splicing difference evolved since 
the two species diverged, while the sequence differences 
in the C-terminus of PtHSP02-5 suggest that this region 
could be very variable. PtHSP04 contained a four-gene 
locus predicted to code for secreted proteins. Two of 
them are unique while two are recently duplicated paralogs. 
Secreted proteins are believed to be most variable amongst 
fungal proteins because they are under the highest selection 
pressure to avoid recognition by the host [16,19,37]. At 
least with these examples, it can be said that sequence
variation, recombination, and duplication are driving the 
changes in these proteins. 

Numerous fungal genomes have recently been generated,
analyzed, and published. Now comparisons can be
made to find core gene families associated with specific
life styles and cycles. In an extensive comparison, 
Duplessis et al. [6] identified core conserved genes 
needed for biotrophic life in both rust species. It appears 
that PtHSP02-6 may be one of those genes. PtHSP02-6 
aligns with a G-protein beta subunit (GPBS) and no 
peptide differences were found between Pt and Pgt. 
Furthermore, there is little difference between Pt and 
Mlp suggesting that this protein is under strong 
purifying selection in rusts. Yet, the genes flanking
PtHSP02-6 are relatively conserved, indicating strong
selection and the importance of this gene. In Verticillium
dahliae, GPBS mutants had reduced virulence,
increased microsclerotia and conidiation and decreased
ethylene production [38]. GPBS is also involved in
similar functions in F. oxysporum [39]. In M. grisea,
GPBS mutants could not form appressoria, and hyphae
could not penetrate and grow in rice leaves
[40]. The authors also showed that by overexpressing
GPBS in the fungus, appressoria could form on
hydrophilic surfaces, suggesting that GPBS is necessary
for control of surface recognition, growth and
appressorium formation [40]. Surface recognition and
appressorium formation are the key to rust fungal
establishment. This suggests that PtHSP02-6 is indispensable
for the biotrophic lifecycle and could be a
regulating link in pathogenicity.

A strong correlation between genome size and repeti- 
tive element content has been found for many fungal 
genomes. Genome expansion is significant between Pt 
and Pgt, even though they are both closely related and 
are both dikaryotic. The assembled genome for Pgt is 
89 Mb [6] while Pt is currently estimated to be 135 Mb 
(Broad Institute). The sequence analysis of the three 
BAC clones gives some indication of why the Pt genome
may be larger than the Pgt genome. Pt1F16 had the least
mobile element complexity, but had Gypsy elements
within Copia elements, as did PtHSP02. PtHSP02 also
harbored numerous TEs and LTRs in the region between
PtHSP02-1 and 3. Meanwhile, PtHSP04 contains more
non-TE repeat ORFs, its homologous genes are scattered
across Pgt scaffolds, and its sequence reveals recombination
and/or transposition events disrupting syntenic genes.
There is also evidence of gene movement by active
elements. PtHSP02-2 was directly flanked by LTRs and
was not found in PgtSC7, PtHSP04-5 was also flanked by
LTRs and could be found in PgtSC48, and PtHSP04-10
only had a single LTR flanking it, but was flanked on the
opposite side by a partial Harbinger element. It is possible
that since these regions are in repetitive sequence there
are assembly errors in Pgt; however, each Pgt homolog
is in a high confidence scaffold.

Most surprising are the non-transposable element, 
repeated sequences found in the Pt BACs (Table 2). Each 
had homologs throughout the Pgt genome. Most had
conserved domains that were maintained, while flanking
sequences were greatly diverged. Many were high in Lys,
suggesting a helical protein structure. Some are expressed,
based on the presence of an aligning EST, and have
homologs in Mlp, suggesting functional importance. The helical
nature of these proteins would suggest their involvement
as nucleotide binding elements. Pt has five different
spore types in its lifecycle involving two different hosts 
requiring a significant level of cell modifications and cell 
types. Sequences like these have not been described 
before and could represent undiscovered elements in the 
disease cycle. 
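
The lysine enrichment noted above is a simple compositional measure. A minimal sketch of how it could be computed for a predicted protein is given here; the function name and the example sequence are hypothetical and are not taken from the Pt BACs.

```python
from collections import Counter


def residue_fraction(protein: str, residue: str = "K") -> float:
    """Return the fraction of `residue` in a one-letter-code protein sequence."""
    seq = protein.upper().replace("*", "")  # ignore a stop symbol if present
    if not seq:
        raise ValueError("empty sequence")
    return Counter(seq)[residue.upper()] / len(seq)


# Hypothetical sequence, for illustration only (not a Pt protein).
example = "MKKLAEKLKKEAEKLKKQ"
lys_fraction = residue_fraction(example)
print(f"Lys fraction: {lys_fraction:.2f}")
```

A compositional value like this says nothing about structure by itself; a secondary structure predictor would still be needed to support the helical interpretation.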

This work has shown significant genome synteny 
between two closely related wheat rust fungi. Gene 
sequences confirmed previous findings of the existence 
of EST sequence variation between Pt and Pgt. Various 
levels of homology are present, but many of the genes 
are diverging in a species-specific manner [32]. 
Both genomes contain a significant number of mobile 
elements. Some TE copies are conserved between the 
two species, suggesting ancestral insertion. The insertion 
of TE sequences helps explain genome expansion, and 
their insertion near secreted protein genes may alter 
their regulation or cause their duplication and spread or 
deletion. Most surprising was the presence of small 
predicted non-TE genes with numerous homologs in 
Pgt. As many of the small repeated sequences are highly 
helical in predicted structure, they may be 
involved in DNA binding and regulation. Further work 
is needed to determine whether they are expressed and at 
what stage of the life cycle. Once analysis of the Pt and 
Pst genomes is complete, it can be determined whether 
the repeated nature of these predicted genes is maintained 
within the wheat rust fungi. 

Methods 

Pt BAC library 

Total genomic DNA for the BAC library construction 
was isolated from P. triticina (Pt) Race 1, BBBD [41] 
urediniospores collected from susceptible wheat (Triticum 
aestivum L.) cultivar Thatcher. Spores were increased on 
plants spray-inoculated with a urediniospore suspension in 
light mineral oil (Soltrol 170 isoparaffin, Conoco-Phillips 
Chemical Co, Borger TX). The oil was allowed to evaporate 
for 30 min, then plants were moved to a dark dew chamber 
at 20°C and 100% relative humidity for 24 hrs for urediniospore 
germination and appressorium formation. Plants were then 
grown in a growth chamber under a 16-hour day at 20°C. 
After 10 days, urediniospores were collected and germinated 
for 8 hrs by densely dusting them over sterile water in dishes. 
To stimulate germination under these crowded conditions, a 
volatile nonanol solution (1.56 µl nonanol 
(Sigma-Aldrich, St. Louis MO), 1 ml acetone, 19 ml of 
ddH2O) was spotted on filter paper suspended in the dish 
lids. The BAC library was constructed by BioS&T 
(Montreal, Quebec, Canada; www.biost.com). In brief, nu- 
clei were isolated from collected germinated urediniospores 
and embedded in 1% low melting point agarose plugs. Total 
genomic DNA embedded in the plugs was partially digested 
with HindIII, separated by pulsed-field gel 
electrophoresis, and the 100-200 kb region was isolated. 
After electro-elution and dialysis, the DNA fragments were 
cloned into the HindIII site of BAC vector pIndigoBAC5 
(Epicentre Technologies, Madison, WI) and propagated in 
E. coli DH10B (Life Technologies, Grand Island, NY). 
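
The expected genome coverage of a library follows from clone count, insert size and genome size. The sketch below brackets it using the 100-200 kb size selection given above, the 15,360 clones reported in the next subsection, and the 135 Mb Pt estimate. It assumes that every clone carries an insert and that inserts sample the genome evenly; the mean insert size is not stated here, so only the two bounds are computed. The function name is hypothetical.

```python
def library_coverage(n_clones: int, mean_insert_kb: float, genome_mb: float) -> float:
    """Expected genome equivalents: clones x mean insert size / genome size."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


N_CLONES = 15_360   # clones arrayed, from the text
GENOME_MB = 135.0   # current Pt genome estimate, from the text

# Bounds assume all inserts sit at one end of the 100-200 kb size selection.
low = library_coverage(N_CLONES, 100.0, GENOME_MB)
high = library_coverage(N_CLONES, 200.0, GENOME_MB)
print(f"coverage between {low:.1f}x and {high:.1f}x")
```

Under these assumptions the library would represent roughly 11 to 23 genome equivalents; empty clones or a skewed insert size distribution would lower the real figure.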

BAC clone selection and sequencing 

The resulting BAC library of 15,360 individual clones 
was arrayed on nylon membranes. After colony lysis, 
DNA was bound to the membranes using standard 
procedures [41]. BAC filters were probed to identify 
clones for sequencing. Several candidate fragments were 
selected as probes. The SfiI insert from a Pt cDNA clone, 
PT0313.J16.C21 (GenBank accession GR497566; [32]), was 
labeled with α-32P-dCTP using a random primer labeling kit 
(GE Healthcare, Pittsburgh, PA). Positive BAC clones were 
verified by PCR using primers Forward 
5'-AGCTCTTCACACGATTCC and Reverse 
5'-ATCTTGGCATTGAGCATC. The second probe, SP02, was 
amplified from Pt cDNA clone PT0061b.D10.TB (GenBank 
accession EC400508) by PCR using primers Forward 
5'-CTTTCTAGACCTAGGCAACTTAACAC and Reverse 
5'-GCGCCATGGACTAGTTGAAGAGGGA. The third probe, SP04, 
was amplified from cDNA clone PT0131d.B10.BR 
(GenBank accession EC414978) using PCR primers 
Forward 5'-CACGAGGGGAACCGATGGGGGT and 
Reverse 5'-TGGGTTGGTAAACTATTAATGTGCAC. 
Southern hybridizations were as described [41]. 
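
The primer sequences listed above can be checked for length and GC content with a short script. A minimal sketch follows; the sequences are copied from the text with any line-break spaces removed, while the dictionary keys and function names are labels chosen here for illustration.

```python
PRIMERS = {
    "probe1_F": "AGCTCTTCACACGATTCC",
    "probe1_R": "ATCTTGGCATTGAGCATC",
    "SP02_F": "CTTTCTAGACCTAGGCAACTTAACAC",
    "SP02_R": "GCGCCATGGACTAGTTGAAGAGGGA",
    "SP04_F": "CACGAGGGGAACCGATGGGGGT",
    "SP04_R": "TGGGTTGGTAAACTATTAATGTGCAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an unambiguous DNA sequence."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError(f"not a plain A/C/G/T sequence: {seq!r}")
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement, e.g. to locate a reverse primer on the top strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(f"{name}\t{len(seq)} nt\tGC={gc_fraction(seq):.2f}")
```

The validation in `gc_fraction` rejects sequences that still contain a stray space or other extraction residue, which is the most likely error when copying primers from a typeset page.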

Selected BAC clones were sent as a stab culture to the 
Genome Center at Washington University, St. Louis, MO. 
BAC clones were cultured, subcloned, shotgun sequenced, 
and assembled (Washington University Genome Center, 
St. Louis, MO). Gene calls were made using FGENESH 
with gene models specific to Puccinia 
(http://linux1.softberry.com/berry.phtml). BAC clone gene 
predictions were compared to Pgt, Mlp and Um genomic resources 
(http://www.broadinstitute.org/annotation/genome/puccinia_group/Blast.html; 
http://genomeportal.jgi-psf.org/Mellp1/Mellp1.home.html, 
verified May 7, 2012; and 
http://www.broadinstitute.org/annotation/genome/ustilago_maydis) 
using the BLASTN and BLASTX algorithms with settings 
of E value = 1e-3, Matrix = BLOSUM62, and gapped 
alignment. Repeats were identified using fungaldb of 
RepBase 17.04 [42], containing the repeats of Pt, Pgt and 
Pst. Long terminal repeats (LTR) were determined by 
LTR_Finder [43] (e.g., red arrows in Figure 3). 
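
The E value cutoff described above amounts to a filter on the hit list. The sketch below applies such a filter to BLAST tabular output (the standard 12-column format), assuming the cutoff reads 1e-3. The searches in this work were run through the listed web resources, so the use of tabular files, the function name, and the example rows (including the subject identifiers) are assumptions made for illustration.

```python
import csv
import io

# Column order of standard 12-column BLAST tabular output.
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue: float = 1e-3):
    """Yield hits (as dicts) whose E value is at or below the cutoff."""
    reader = csv.DictReader(handle, fieldnames=BLAST6_FIELDS, delimiter="\t")
    for row in reader:
        if float(row["evalue"]) <= max_evalue:
            yield row


# Made-up rows: one strong hit and one that fails the cutoff.
example = io.StringIO(
    "PtHSP02-1\thypothetical_subject_A\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t190\n"
    "PtHSP02-1\thypothetical_subject_B\t30.0\t60\t42\t0\t10\t69\t3\t62\t0.5\t25\n"
)
kept = [hit["sseqid"] for hit in filter_hits(example)]
print(kept)
```

Only the first row passes, since its E value of 1e-50 is below the cutoff while 0.5 is not.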